Intelligent Paper Notes

Cloud-Device Collaborative Adaptation to Continual Changing Environments in the Real-world

Yulu Gan , Mingjie Pan , Rongyu Zhang , Zijian Ling , Lingran Zhao , Jiaming Liu , Shanghang Zhang

Category: Computer Vision

2022-12-02

When facing changing environments in the real world, the lightweight model on client devices suffers from severe performance drops under distribution shifts. The main limitations of existing device models are (1) they cannot be updated because of the device's limited computation, and (2) the lightweight model has limited generalization ability. Meanwhile, recent large models have shown strong generalization capability on the cloud, but they cannot be deployed on client devices because of the devices' tight computational constraints. To enable the device model to deal with changing environments, we propose a new learning paradigm of Cloud-Device Collaborative Continual Adaptation, which encourages collaboration between cloud and device and improves the generalization of the device model. Based on this paradigm, we further propose an Uncertainty-based Visual Prompt Adapted (U-VPA) teacher-student model to transfer the generalization capability of the large model on the cloud to the device model. Specifically, we first design Uncertainty Guided Sampling (UGS) to continuously screen out challenging data and transmit the most out-of-distribution samples from the device to the cloud. Then we propose a Visual Prompt Learning Strategy with Uncertainty guided updating (VPLU) to specifically deal with the selected samples that exhibit larger distribution shifts. We transmit the visual prompts to the device and concatenate them with the incoming data to pull the device testing distribution closer to the cloud training distribution. We conduct extensive experiments on two object detection datasets with continually changing environments. Our proposed U-VPA teacher-student framework outperforms previous state-of-the-art test-time adaptation and device-cloud collaboration methods. The code and datasets will be released.
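
A minimal sketch of the uncertainty-guided selection step, assuming the predictive entropy of the device model's softmax output as the uncertainty score and a fixed upload budget `k`. The abstract does not specify the exact uncertainty measure, so `predictive_entropy` and `select_uncertain` are hypothetical names, not the authors' code.

```python
import math
from typing import List, Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain(batch_probs: List[Sequence[float]], k: int) -> List[int]:
    """Indices of the k most uncertain samples, most uncertain first.

    Assumption: higher entropy is treated as "more out-of-distribution",
    so these are the samples a device would upload to the cloud.
    """
    scores = [predictive_entropy(p) for p in batch_probs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy softmax outputs from a hypothetical device model's classifier head.
batch = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform, most uncertain
    [0.70, 0.20, 0.10],  # moderately uncertain
]
print(select_uncertain(batch, k=2))  # [1, 2]
```

Entropy is only one possible score; any per-sample uncertainty that ranks shifted inputs higher could be dropped into `select_uncertain` unchanged.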

Delving into Effective Gradient Matching for Dataset Condensation

Zixuan Jiang , Jiaqi Gu , Mingjie Liu , David Z. Pan

Category: Machine Learning | Computer Vision

2022-07-30

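As deep learning models and datasets scale up rapidly, network training has become extremely time-consuming and resource-intensive. Instead of training on the entire dataset, learning with a small synthetic dataset is an efficient alternative. Extensive research has explored dataset condensation, in which gradient matching achieves state-of-the-art performance. The gradient matching method directly targets the training dynamics by matching the gradients obtained when training on the original and the synthetic datasets. However, there has been little in-depth investigation into the principles and effectiveness of this method. In this work, we examine the gradient matching method from a comprehensive perspective and answer the critical questions of what, how, and where to match. We propose multi-level gradient matching to involve both intra-class and inter-class gradient information. We show that the distance function should focus on the angle while also taking the magnitude into account to delay overfitting. An overfitting-aware adaptive learning-step strategy is also proposed to trim unnecessary optimization steps and improve algorithmic efficiency. Ablation and comparison experiments demonstrate that our proposed method achieves superior accuracy, efficiency, and generalization compared with prior work.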
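
To make the angle-versus-magnitude point concrete, here is a minimal sketch of a gradient-matching distance between one layer's flattened gradients, assuming the form (1 - cosine similarity) plus a small relative-magnitude penalty weighted by a hypothetical coefficient `lam`. The paper's exact distance and weighting may differ.

```python
import math
from typing import Sequence


def _l2(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def gradient_match_distance(g_real: Sequence[float], g_syn: Sequence[float],
                            lam: float = 0.1, eps: float = 1e-8) -> float:
    """Angle-focused distance between real-data and synthetic-data gradients.

    angle term:     1 - cos(g_real, g_syn), in [0, 2]
    magnitude term: abs(norm(g_real) - norm(g_syn)) / norm(g_real),
                    scaled by the hypothetical `lam` so the angle dominates.
    """
    n_real, n_syn = _l2(g_real), _l2(g_syn)
    cos = sum(a * b for a, b in zip(g_real, g_syn)) / (n_real * n_syn + eps)
    angle_term = 1.0 - cos
    magnitude_term = abs(n_real - n_syn) / (n_real + eps)
    return angle_term + lam * magnitude_term


# Same direction, twice the length: only the small magnitude term contributes.
print(round(gradient_match_distance([1.0, 0.0], [2.0, 0.0]), 4))  # 0.1
# Orthogonal, equal length: the angle term dominates.
print(round(gradient_match_distance([1.0, 0.0], [0.0, 1.0]), 4))  # 1.0
```

The cosine core ignores gradient scale, while the weak magnitude term still discourages the synthetic gradients from drifting far in norm.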

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL

Wei Shi , Hanrui Wang , Jiaqi Gu , Mingjie Liu , David Pan , Song Han , Nan Sun

Category: Artificial Intelligence | Machine Learning

2022-07-13

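Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations in chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under typical conditions, limited research has explored robust designs under real and unpredictable silicon variations. Automatic analog design against variations requires prohibitive computation and time costs. To address this challenge, we present RobustAnalog, a robust circuit design framework that incorporates variation information into the optimization process. Specifically, circuit optimizations under different variations are treated as a set of tasks. Similarities among tasks are leveraged and competition between them is alleviated to realize sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further reduction in simulation cost. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g., gain, bandwidth, noise) across various variations. We compare RobustAnalog with Bayesian optimization, an evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG), and demonstrate that RobustAnalog can significantly reduce the required optimization time by 14-30 times. Our study therefore provides a feasible approach to handling various real silicon conditions.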
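
One plausible reading of the per-iteration task pruning, sketched under stated assumptions: each task is a PVT corner, each spec is a lower bound, and a corner is dropped once the current circuit parameters meet every spec there. The corner names, spec values, toy numbers, and the `violation`/`prune_tasks` helpers are all hypothetical; the paper's actual pruning criterion may differ.

```python
from typing import Dict, List

# Hypothetical specs: metric -> minimum acceptable value.
# (Upper-bound specs such as noise could be handled by negating the metric.)
SPECS: Dict[str, float] = {"gain_db": 60.0, "bandwidth_hz": 1e6}


def violation(perf: Dict[str, float], specs: Dict[str, float]) -> float:
    """Sum of normalized shortfalls below each spec; 0.0 means every spec is met."""
    return sum(max(0.0, (lo - perf[m]) / abs(lo)) for m, lo in specs.items())


def prune_tasks(results: Dict[str, Dict[str, float]], specs: Dict[str, float]) -> List[str]:
    """Keep only the variation corners that still violate a spec, worst first,
    so the next optimization iteration simulates fewer corners."""
    active = [c for c, perf in results.items() if violation(perf, specs) > 0.0]
    return sorted(active, key=lambda c: violation(results[c], specs), reverse=True)


# Toy simulated performance of the current parameters at three PVT corners.
results = {
    "TT_25C": {"gain_db": 65.0, "bandwidth_hz": 1.2e6},
    "SS_125C": {"gain_db": 55.0, "bandwidth_hz": 0.8e6},
    "FF_-40C": {"gain_db": 58.0, "bandwidth_hz": 1.5e6},
}
print(prune_tasks(results, SPECS))  # ['SS_125C', 'FF_-40C']
```

Only the two failing corners remain active, so the next round of simulations covers two tasks instead of three.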

ELight: Enabling Efficient Photonic In-Memory Neurocomputing with Life Enhancement

Hanqing Zhu , Jiaqi Gu , Chenghao Feng , Mingjie Liu , Zixuan Jiang , Ray T. Chen , David Z. Pan

Category: Machine Learning

2021-12-15

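With recent advances in optical phase change materials (PCM), photonic in-memory neurocomputing has demonstrated its superiority in optical neural network (ONN) designs, offering near-zero static power consumption, speed-of-light latency, and a compact footprint. However, because the scale of a single core is limited, photonic tensor cores require massive hardware reuse to implement large matrix multiplications. The resulting large number of PCM writes leads to serious dynamic power consumption and overwhelms the fragile PCM, which has limited write endurance. In this work, we propose a synergistic optimization framework, ELight, to minimize the overall write effort for efficient and reliable optical in-memory neurocomputing. We first propose write-aware training to encourage similarity among weight blocks, and combine it with a post-training optimization method that reduces programming effort by eliminating redundant writes. Experiments show that ELight can reduce the total number of writes and the dynamic power by more than 20x with comparable accuracy. With ELight, photonic in-memory neurocomputing moves toward viable applications in machine learning, with preserved accuracy, an order-of-magnitude longer lifetime, and lower programming energy.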
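
A toy model of why block similarity and load order matter for write counts. It assumes a single tensor core that holds one quantized weight block, that loading a block reprograms only the PCM cells whose level changes, and that blocks may be loaded in any order (e.g., because their partial products are accumulated). The greedy nearest-neighbor reordering is a stand-in for illustration, not ELight's post-training optimization; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Block = Tuple[int, ...]  # flattened, quantized PCM levels of one weight block


def writes_between(a: Block, b: Block) -> int:
    """PCM cells that must be reprogrammed to replace block a with block b."""
    return sum(x != y for x, y in zip(a, b))


def total_writes(order: Sequence[Block]) -> int:
    """Writes to load blocks in sequence; the first load programs every cell."""
    writes = len(order[0])
    for prev, nxt in zip(order, order[1:]):
        writes += writes_between(prev, nxt)
    return writes


def greedy_order(blocks: Sequence[Block]) -> List[Block]:
    """Start from the first block, then always load the cheapest remaining one."""
    remaining = list(blocks[1:])
    order = [blocks[0]]
    while remaining:
        nxt = min(remaining, key=lambda b: writes_between(order[-1], b))
        remaining.remove(nxt)
        order.append(nxt)
    return order


A, B, C, D = (1, 1, 1, 1), (3, 3, 3, 3), (1, 1, 1, 2), (3, 3, 3, 2)
print(total_writes([A, B, C, D]))                # 15
print(total_writes(greedy_order([A, B, C, D])))  # 9
```

Write-aware training pushes weight blocks to be more alike, which shrinks `writes_between` for every pair and gives any reordering step more redundant writes to eliminate.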

PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment

Qingyi Pan , Ning Guo , Letu Qingge , Jingyi Zhang , Pei Yang

Category: Computer Vision

2023-01-03

Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and variations in image content, which complicate distortion patterns across different scales and aggravate the difficulty of the BIQA regression problem. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies that help the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression problem, following the easy-to-hard progression of human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the results indicate that PMT-IQA outperforms the compared approaches, and that both the MS and PMT modules improve the model's performance.
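
A minimal sketch of an easy-to-hard multi-task schedule. It assumes two tasks, a hypothetical easy auxiliary task (for example, coarse quality-level classification) and the hard score-regression task, whose loss weights shift linearly over training. The abstract does not give PMT's actual tasks or schedule, so `task_weights` and `combined_loss` are illustrative only.

```python
from typing import Tuple


def task_weights(epoch: int, total_epochs: int) -> Tuple[float, float]:
    """(w_easy, w_hard): weight moves linearly from the easy task to the hard task."""
    t = epoch / max(total_epochs - 1, 1)  # training progress in [0, 1]
    t = min(max(t, 0.0), 1.0)
    return 1.0 - t, t


def combined_loss(loss_easy: float, loss_hard: float, epoch: int, total_epochs: int) -> float:
    """Multi-task objective whose emphasis shifts from easy to hard over training."""
    w_easy, w_hard = task_weights(epoch, total_epochs)
    return w_easy * loss_easy + w_hard * loss_hard


for epoch in range(5):
    print(epoch, task_weights(epoch, total_epochs=5))
# 0 (1.0, 0.0) ... 4 (0.0, 1.0)
```

A linear ramp is the simplest choice; a step or cosine schedule would fit the same interface.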

Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge

Longxu Dou , Yan Gao , Xuqi Liu , Mingyang Pan , Dingzirui Wang , Wanxiang Che , Dechen Zhan , Min-Yen Kan , Jian-Guang Lou

Category: Natural Language Processing

2023-01-03

In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
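
To illustrate what "formulaic knowledge" can look like, here is a toy knowledge bank that maps a domain concept to an SQL expression template. Entries are retrieved by simple substring matching and then grounded onto a table's column names. The entries, the `store_finance` table, and the `retrieve`/`ground` helpers are hypothetical; this is not the ReGrouP framework, whose internals the abstract does not detail.

```python
from typing import Dict, Optional

# Hypothetical formulaic knowledge bank: concept -> SQL expression over named slots.
KNOWLEDGE_BANK: Dict[str, str] = {
    "gross margin": "({revenue} - {cost}) / {revenue}",
    "growth rate": "({current} - {previous}) / {previous}",
}


def retrieve(question: str) -> Optional[str]:
    """Return the first concept whose name appears in the question, if any."""
    q = question.lower()
    for concept in KNOWLEDGE_BANK:
        if concept in q:
            return concept
    return None


def ground(concept: str, columns: Dict[str, str]) -> str:
    """Fill the formula's slots with concrete column names of the target table."""
    return KNOWLEDGE_BANK[concept].format(**columns)


question = "What is the gross margin of each store?"
concept = retrieve(question)
expr = ground(concept, {"revenue": "sales", "cost": "expense"})
print(f"SELECT store, {expr} FROM store_finance")
# SELECT store, (sales - expense) / sales FROM store_finance
```

Storing the formula once and grounding it per table is what lets such knowledge substitute for annotating extra question/SQL pairs.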

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Jiahao Zhu , Daizong Liu , Pan Zhou , Xing Di , Yu Cheng , Song Yang , Wenzheng Xu , Zichuan Xu , Yao Wan , Lichao Sun

Category: Computer Vision

2023-01-02

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. This mechanism is also able to supplement the missing consecutive visual semantics of the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
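
The boundary-bias problem can be seen with a few lines of arithmetic: uniform sparse sampling usually skips the annotated boundary frame. The sketch below also adds a second, offset sampling pass (an assumption about what "additional contextual frames" could mean) and Gaussian soft boundary labels over the sampled frames. `sigma`, the offset, and all function names are hypothetical and not SSRN's implementation.

```python
import math
from typing import List


def uniform_sample(num_frames: int, num_samples: int, offset: float = 0.0) -> List[int]:
    """Evenly spaced frame indices, optionally shifted by `offset` frames."""
    step = num_frames / num_samples
    return [int(offset + i * step) for i in range(num_samples)]


def soft_boundary_labels(frames: List[int], boundary: int, sigma: float) -> List[float]:
    """Gaussian scores around the true boundary, rescaled so the best frame gets 1.0."""
    scores = [math.exp(-((f - boundary) ** 2) / (2 * sigma ** 2)) for f in frames]
    peak = max(scores)
    return [s / peak for s in scores]


start = 23                                   # annotated start frame of the target segment
base = uniform_sample(100, 10)               # [0, 10, ..., 90]
print(start in base)                         # False: the true boundary is dropped
extra = uniform_sample(100, 10, offset=5.0)  # [5, 15, ..., 95]
frames = sorted(base + extra)
labels = soft_boundary_labels(frames, start, sigma=3.0)
print(frames[labels.index(1.0)])             # 25, the closest sampled frame
```

Soft labels let nearby frames (here 20 and 25) share credit for the boundary instead of forcing a hard choice between two frames that are both slightly wrong.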

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson , William Qi , Tanmay Agarwal , John Lambert , Jagjeet Singh , Siddhesh Khandelwal , Bowen Pan , Ratnesh Kumar , Andrew Hartnett , Jhony Kaesemodel Pontes

Category: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2023-01-02

We introduce Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with predicting the future motion of "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

Zhuo Chen , Jiaoyan Chen , Wen Zhang , Lingbing Guo , Yin Fang , Yufeng Huang , Yuxia Geng , Jeff Z. Pan , Wenting Song , Huajun Chen

Category: Artificial Intelligence | Natural Language Processing

2022-12-29

As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities such as images. However, current MMEA algorithms all adopt KG-level modality fusion strategies and ignore modality differences among individual entities, which hurts robustness to potential noise in the modalities (e.g., unidentifiable images and relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed to address vague entity details. Extensive experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource settings, but also has a limited number of parameters, favorable speed, and good interpretability. Our code will be available soon.
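
Instance-level fusion can be sketched as a per-entity softmax over modality scores followed by a weighted sum of modality embeddings. In MEAformer the correlation coefficients are predicted dynamically by the model; here the scores are simply given as inputs, and `modality_weights` and `fuse` are hypothetical helpers, so treat this as an illustration of the weighting idea only.

```python
import math
from typing import Dict, List


def modality_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over one entity's per-modality scores."""
    m = max(scores.values())
    exps = {name: math.exp(s - m) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def fuse(embs: Dict[str, List[float]], scores: Dict[str, float]) -> List[float]:
    """Weighted sum of same-length modality embeddings for a single entity."""
    w = modality_weights(scores)
    dim = len(next(iter(embs.values())))
    return [sum(w[name] * vec[d] for name, vec in embs.items()) for d in range(dim)]


# Entity whose image is unidentifiable: a low image score suppresses that modality.
w = modality_weights({"graph": 2.0, "relation": 2.0, "image": -5.0})
print({k: round(v, 4) for k, v in w.items()})

# Equal scores reduce to a plain average of the modality embeddings.
print(fuse({"graph": [1.0, 0.0], "image": [0.0, 1.0]}, {"graph": 0.0, "image": 0.0}))  # [0.5, 0.5]
```

Because the weights are computed per entity, a noisy image hurts only the entities it belongs to, unlike a single KG-level weight shared by all entities.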

SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering

Mingchen Li , Liqi Kang , Yi Xiong , Yu Guang Wang , Guisheng Fan , Pan Tan , Liang Hong

Category: Machine Learning

2022-12-29

Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model that predicts the fitness of protein mutants by leveraging both sequence and structure information and exploiting an attention mechanism. Our model integrates the local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space, and structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models in predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy that leverages data from unsupervised models to pre-train our model. After that, our model achieves strikingly high accuracy in predicting the fitness of protein mutants, especially for higher-order variants (> 4 mutation sites), when fine-tuned using only a small amount of experimental mutation data (<50). The proposed strategy is of great practical value, as the required experimental effort, i.e., producing a few tens of experimental mutation data points on a given protein, is generally affordable for an ordinary biochemistry group and can be applied to almost any protein.
